DPF System Installation
This section creates the DPF system components and the basic infrastructure required for a functioning DPF-enabled cluster.
The following files define the DPFOperatorConfig, which installs the DPF system components, and the DPUCluster, which serves as the Kubernetes control plane for the DPU nodes.
manifests/02-dpf-system-installation/operatorconfig.yaml
---
apiVersion: operator.dpu.nvidia.com/v1alpha1
kind: DPFOperatorConfig
metadata:
  name: dpfoperatorconfig
  namespace: dpf-operator-system
spec:
  kamajiClusterManager:
    disable: false
  provisioningController:
    bfbPVCName: bfb-pvc
    installInterface:
      installViaRedfish:
        # Set this to the IP of one of your control plane nodes, followed by port 8080.
        bfbRegistryAddress: "10.0.110.1:8080"
    dmsTimeout: 900
  staticClusterManager:
    disable: false
  networking:
    controlPlaneMTU: 9216
    highSpeedMTU: 9216
manifests/02-dpf-system-installation/dpucluster.yaml
---
apiVersion: provisioning.dpu.nvidia.com/v1alpha1
kind: DPUCluster
metadata:
  name: dpu-cplane-tenant1
  namespace: dpu-cplane-tenant1
spec:
  type: kamaji
  maxNodes: 10
  version: v1.30.2
  clusterEndpoint:
    # deploy keepalived instances on the nodes that match the given nodeSelector.
    keepalived:
      # interface on which keepalived will listen. Should be the oob interface of the control plane node.
      interface: $DPUCLUSTER_INTERFACE
      # Virtual IP reserved for the DPU Cluster load balancer. Must not be allocatable by DHCP.
      vip: $DPUCLUSTER_VIP
      # virtualRouterID must be in range [1,255], make sure the given virtualRouterID does not duplicate with any existing keepalived process running on the host
      virtualRouterID: 126
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
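The range constraint on virtualRouterID stated in the manifest comments can be checked before the manifest is applied. The following is a minimal sketch, assuming a POSIX shell. The function name valid_vrid is hypothetical and is not part of DPF or keepalived, and the check covers only the numeric range, not whether the ID is already used by another keepalived process on the host.

```shell
# Hypothetical helper (not part of DPF or keepalived):
# succeeds only if the argument is an integer in the range [1,255].
valid_vrid() {
  case "$1" in
    # Reject empty input and anything containing a non-digit character.
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 255 ]
}

# Example: check the value used in dpucluster.yaml.
if valid_vrid 126; then
  echo "virtualRouterID 126 is within [1,255]"
fi
```

A value that passes this check can still collide with an existing keepalived instance, so the host still needs to be inspected separately.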
Create a namespace for the Kubernetes control plane of the DPU nodes:
Jump Node Console
$ kubectl create ns dpu-cplane-tenant1
Apply the previous YAML files:
Jump Node Console
$ cat manifests/02-dpf-system-installation/operatorconfig.yaml | envsubst | kubectl apply -f -
$ cat manifests/02-dpf-system-installation/dpucluster.yaml | envsubst | kubectl apply -f -
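Both apply commands pipe the manifests through envsubst, which replaces an unset variable with an empty string instead of failing. Because dpucluster.yaml references $DPUCLUSTER_INTERFACE and $DPUCLUSTER_VIP, it can help to confirm that both are exported first. The following is a minimal sketch of such a pre-flight check, assuming a POSIX shell. The helper name require_vars is hypothetical and is not part of DPF.

```shell
# Hypothetical helper (not part of DPF): reports every named variable
# that is unset or empty and returns non-zero if any is missing.
# Note: POSIX sh has no local variables, so "missing" and "value" are global.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check the variables referenced by dpucluster.yaml.
if require_vars DPUCLUSTER_INTERFACE DPUCLUSTER_VIP; then
  echo "all variables set; safe to run envsubst"
fi
```

If the check reports a missing name, export that variable with the value for your environment and run the check again before applying the manifests.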
Verify the DPF system by ensuring that the provisioning and DPUService controller manager deployments are available, all other deployments in the DPF Operator system are available, and that the DPUCluster is ready for nodes to join.
Jump Node Console
## Ensure the provisioning and DPUService controller manager deployments are available.
$ kubectl rollout status deployment --namespace dpf-operator-system dpf-provisioning-controller-manager dpuservice-controller-manager
deployment "dpf-provisioning-controller-manager" successfully rolled out
deployment "dpuservice-controller-manager" successfully rolled out
## Ensure all other deployments in the DPF Operator system are Available.
$ kubectl rollout status deployment --namespace dpf-operator-system
deployment "dpf-operator-argocd-applicationset-controller" successfully rolled out
deployment "dpf-operator-argocd-redis" successfully rolled out
deployment "dpf-operator-argocd-repo-server" successfully rolled out
deployment "dpf-operator-argocd-server" successfully rolled out
deployment "dpf-operator-controller-manager" successfully rolled out
deployment "dpf-operator-kamaji" successfully rolled out
deployment "dpf-operator-maintenance-operator" successfully rolled out
deployment "dpf-operator-node-feature-discovery-gc" successfully rolled out
deployment "dpf-operator-node-feature-discovery-master" successfully rolled out
deployment "dpf-provisioning-controller-manager" successfully rolled out
deployment "dpuservice-controller-manager" successfully rolled out
deployment "kamaji-cm-controller-manager" successfully rolled out
deployment "static-cm-controller-manager" successfully rolled out
## Ensure the DPUCluster is ready for nodes to join.
$ kubectl wait --for=condition=ready --namespace dpu-cplane-tenant1 dpucluster --all